Research questions

In general, the nature of a research question depends on the goal and type of the analysis pursued (see Fig. 3.9 in Francom 2024):

Table 1: Overview of analysis types
Type | Aims | Approach | Methods | Evaluation
Exploratory | Explore: gain insight | Inductive, data-driven, and iterative | Descriptive, pattern detection with machine learning (unsupervised) | Associative
Predictive | Predict: validate associations | Semi-deductive, data-/theory-driven, and iterative | Predictive modeling with machine learning (supervised) | Model performance, feature importance, and associative
Inferential | Explain: test hypotheses | Deductive, theory-driven, and non-iterative | Hypothesis testing with statistical tests | Causal

EXPLORATORY Research questions

Question 1.1

Is there a pattern in the WBG project document corpus¹ that shows non-random variation in the incidence of certain words, phrases, or policy concepts² over time?

Hypothesis

The hypothesis being tested here is that the WBG project document corpus shows a non-random variation in the incidence of certain policy concepts over time.

The launch of a “policy slogan” carries an intrinsic motivation to shift PDOs in a certain direction.

  • This question will be handled in a data-driven way, i.e. starting from patterns observed in the text data and not from predetermined ideas.
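One way to make "non-random variation over time" concrete is sketched below in Python. It is only an illustration under stated assumptions, not the study's method: the function names, the naive substring matching, the choice of variance as the test statistic, and the toy records are all hypothetical. The sketch computes the share of PDO texts per fiscal year that mention a term, then uses a permutation test (shuffling the fiscal-year labels) to estimate how often a spread at least as large would arise by chance.

```python
import random
from collections import defaultdict


def incidence_by_year(records, term):
    """Share of documents per fiscal year whose text contains `term`.

    `records` is an iterable of (fiscal_year, text) pairs. Matching is a
    naive case-insensitive substring check (an assumption; a real analysis
    would tokenize the text first).
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for fy, text in records:
        totals[fy] += 1
        hits[fy] += term.lower() in text.lower()
    return {fy: hits[fy] / totals[fy] for fy in sorted(totals)}


def spread(shares):
    """Test statistic (hypothetical choice): variance of the yearly shares."""
    values = list(shares.values())
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def permutation_p_value(records, term, n_perm=2000, seed=0):
    """Estimate how often shuffled year labels give at least the observed spread."""
    records = list(records)
    rng = random.Random(seed)
    observed = spread(incidence_by_year(records, term))
    years = [fy for fy, _ in records]
    texts = [text for _, text in records]
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(years)  # break any link between year and text
        if spread(incidence_by_year(zip(years, texts), term)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Toy data, invented for illustration: (fiscal_year, PDO text) pairs.
toy_records = (
    [(fy, "improve rural roads") for fy in (2001, 2002) for _ in range(10)]
    + [(fy, "strengthen climate resilience") for fy in (2003, 2004) for _ in range(10)]
)
shares = incidence_by_year(toy_records, "climate")
p_value = permutation_p_value(toy_records, "climate")
```

A small `p_value` suggests that the yearly shares vary more than shuffled labels would produce. It does not say anything about why they vary.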

Question 1.2

Could the WDR³ publications “explain”, or at least correlate with, the recurrence of these concepts over time?

Hypothesis

The “alternative” hypothesis being tested here is that the WDR has a “traction effect” on the PDOs of the following fiscal years (FYs).
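A "traction effect" could be approximated, as a first pass, by comparing how often a concept appears in the fiscal years just after a WDR on that theme with how often it appeared just before. The Python sketch below is a hypothetical illustration: the function name, the three-year window, the mapping of a WDR publication year onto a fiscal year, and the toy shares are all assumptions. The result is a descriptive before/after difference, so it can indicate an association but cannot by itself establish that the WDR caused the shift.

```python
def traction_effect(yearly_share, wdr_year, window=3):
    """Mean share in the `window` years after `wdr_year` minus the mean
    share in the `window` years before it.

    `yearly_share` maps fiscal year -> share of PDO texts mentioning the
    concept. The WDR year itself is left out of both windows.
    """
    before = [yearly_share[y]
              for y in range(wdr_year - window, wdr_year)
              if y in yearly_share]
    after = [yearly_share[y]
             for y in range(wdr_year + 1, wdr_year + 1 + window)
             if y in yearly_share]
    if not before or not after:
        raise ValueError("not enough fiscal years around the WDR year")
    return sum(after) / len(after) - sum(before) / len(before)


# Toy shares, invented for illustration.
toy_share = {2005: 0.25, 2006: 0.25, 2007: 0.25, 2008: 0.40,
             2009: 0.50, 2010: 0.50, 2011: 0.50}
effect = traction_effect(toy_share, wdr_year=2008)
```

A positive `effect` means the concept became more frequent after the report. Comparing it against the same statistic for years without a matching WDR would be one way to judge whether the shift is unusual.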

Question 1.3

Because sector and theme tagging in the WBG project document corpus is very incomplete, is it possible to compensate for the missing tags using TOPIC MODELING?

Hypothesis

The hypothesis being tested here is that some ML techniques can help improve the quality of the “document data collection”, e.g. the poor and incomplete sector/theme tagging of the WBG project documents.

  • Note that for this purpose the available dataset (~ 20 fiscal years' worth of PDO descriptions) has been split into training, validation, and test sets.
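The split mentioned above can be sketched in a few lines of Python. This is a minimal illustration only: the text does not state the proportions or whether the split was random or chronological, so the 70/15/15 fractions, the fixed seed, and the function name are assumptions.

```python
import random


def split_dataset(items, train=0.70, validation=0.15, seed=42):
    """Shuffle `items` and cut them into training, validation, and test lists.

    The fractions are hypothetical defaults; whatever remains after the
    training and validation parts becomes the test set.
    """
    if not 0 < train < 1 or not 0 <= validation < 1 or train + validation >= 1:
        raise ValueError("fractions must leave room for a test set")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# Stand-in identifiers for 100 PDO descriptions (invented for illustration).
pdo_ids = list(range(100))
train_set, validation_set, test_set = split_dataset(pdo_ids)
```

A random split like this one mixes fiscal years across the three sets. Holding out the most recent fiscal years instead would be a different design choice, better suited to checking how a model behaves on later projects.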

For the moment, the study’s aim is mainly to EXPLORE (e.g., trends over time in phrase occurrence), and possibly to PREDICT (e.g., using ML to enhance the quality of metadata variables). Possible follow-up questions will depend on the results of the exploratory questions above.

EXPLANATORY Research questions

PREDICTIVE Research questions

References

Francom, Jerid. 2024. An Introduction to Quantitative Text Analysis for Linguistics: Reproducible Research Using R. 1st ed. London: Routledge. https://doi.org/10.4324/9781003393764.

Footnotes

  1. The WBG project documents analyzed in this case are the short descriptive texts of the Project Development Objectives (PDOs).

  2. Concepts encompass “policy focus”, “sector”, “strategy” or “emerging priority” in the arena of funding for development ….

  3. WDRs (World Development Reports) are the flagship reports of the World Bank Group and have been published annually since 1978.